Data Management
Overview
Archiving is now handled by Elasticsearch Lifecycle Management (LSM), which automatically manages rolling indices. It has four main phases that may be used as part of the management: Hot, Warm, Cold, and Delete.
- Hot indices are currently being written to.
- Warm indices are indices that have reached a predefined rollover criterion (e.g. size or age) and may be marked as read-only.
- Cold indices hold data old enough to be left out of general searches; they are no longer searchable by aliases.
- Delete is when the data may finally be removed.
Indices managed by LSM should use a configured alias for both reads and writes. Only data in the Cold phase should ever be queried using a specific index name.
Developer Notes (Related to Changes)
We are aligning our Data Management strategy with Elasticsearch, which makes future upgrades easier.
LSM Structure
Indices
- Executionsummary
- Worksheet
- Processrequest
- Taskresult
Naming Convention for Indices
Be aware that the names of the rolling indices have changed under LSM. The old format used the current day for daily rolling, e.g. worksheet-20201101, worksheet-20201102, etc. The new format uses a number that increments as rollovers occur, e.g. worksheet-000001, worksheet-000002, etc.
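The naming change can be illustrated with a small sketch. It assumes the sequence number is always six zero-padded digits, as in the examples above; the helper names `next_rollover_name` and `is_legacy_daily_name` are hypothetical and not part of the product.

```python
import re

# New-style rollover names, e.g. "worksheet-000001" (six digits assumed).
ROLLOVER_NAME = re.compile(r"^(?P<base>[a-z]+)-(?P<seq>\d{6})$")
# Old-style daily names, e.g. "worksheet-20201101" (eight-digit date).
LEGACY_DAILY_NAME = re.compile(r"^[a-z]+[-_]\d{8}$")


def next_rollover_name(current: str) -> str:
    """Return the name that follows `current` in the incrementing scheme."""
    match = ROLLOVER_NAME.match(current)
    if match is None:
        raise ValueError(f"not a rollover-style index name: {current!r}")
    seq = int(match.group("seq")) + 1
    return f"{match.group('base')}-{seq:06d}"


def is_legacy_daily_name(name: str) -> bool:
    """True when `name` follows the old daily-rolling format."""
    return LEGACY_DAILY_NAME.match(name) is not None


print(next_rollover_name("worksheet-000001"))      # worksheet-000002
print(is_legacy_daily_name("worksheet-20201101"))  # True
```

A check like `is_legacy_daily_name` may help when scanning scripts or saved queries for references to the old daily names.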
General Requirements
The following is the list of the required hardware components:
- Server Storage—Depends on usage: 500 GB minimum, 750 GB recommended. It is highly recommended to review Sizing Considerations.
- Database Storage—Depends on usage, 20+ GB recommended.
Configuration
Blueprint Settings
Blueprint Property | Description | Default Value |
---|---|---|
rssearch.shards | Configures the number of Elasticsearch shards per index. It is highly recommended that this value is set to 1. | 1 |
rssearch.lsm.rolloversize | Hot Phase setting. How many days old an index must be before rolling over. Set the value so that the largest index (usually task results) never exceeds 20 GB before rolling over. To roll over by size instead, remove the “d” from the value; the number is then read in GB (e.g. rssearch.lsm.rolloversize=20). Note: For high-volume environments, this value may need to be adjusted. See Considerations for High-Volume Environments for more information. | 7d |
rssearch.lsm.readonlydays | Warm Phase setting. How long, in days, after the index has rolled over before it is marked as read-only. If any runbooks use events that may run longer than 1 day, increase this value. May be set to 0 to turn off this phase. Note: If using automation that takes longer than | 1 |
rssearch.lsm.freezedays | Cold Phase setting. How long, in days, after the previous phase the index is frozen. A frozen index is not searched through an alias, which reduces memory usage. Freezing the index requires a close and reopen, during which the cluster will be “red”. On high-volume systems this phase may therefore need to be turned off by setting the value to 0. | 30 |
rssearch.lsm.deletedays | Absolute Retention. How long, in days, after the previous phase the index is deleted. Note: This is our recommended value. If a longer retention period is required, see Considerations for Longer Retention Period. | 365 |
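The two forms of rssearch.lsm.rolloversize described in the table can be told apart with a short sketch. It assumes the only accepted forms are an integer followed by "d" (age in days) or a bare integer (size in GB); the function name `parse_rollover` is hypothetical.

```python
from typing import Tuple


def parse_rollover(value: str) -> Tuple[str, int]:
    """Classify a rssearch.lsm.rolloversize value as age-based or size-based.

    "7d" -> ("age_days", 7)   roll over after 7 days
    "20" -> ("size_gb", 20)   roll over at 20 GB
    """
    text = value.strip().lower()
    if text.endswith("d"):
        return ("age_days", int(text[:-1]))
    return ("size_gb", int(text))


print(parse_rollover("7d"))  # ('age_days', 7)
print(parse_rollover("20"))  # ('size_gb', 20)
```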
Stand-Alone
No additional settings are required for a stand-alone environment.
Cluster
Blueprint settings must match across all RSSearch nodes.
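One way to check that the settings match is to compare the rssearch.* properties collected from each node. The sketch below assumes the blueprint is a plain key=value properties file and that its contents have already been gathered per node; the node names, sample values, and helper names are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def lsm_mismatches(nodes: dict) -> dict:
    """Return {property: {node: value}} for rssearch.* settings that differ."""
    keys = set()
    for props in nodes.values():
        keys.update(k for k in props if k.startswith("rssearch."))
    diffs = {}
    for key in sorted(keys):
        values = {node: props.get(key) for node, props in nodes.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs


# Hypothetical blueprint excerpts from two RSSearch nodes.
NODE_A = "rssearch.shards=1\nrssearch.lsm.rolloversize=7d\nrssearch.lsm.freezedays=30\n"
NODE_B = "rssearch.shards=1\nrssearch.lsm.rolloversize=7d\nrssearch.lsm.freezedays=0\n"

mismatches = lsm_mismatches({
    "node-a": parse_properties(NODE_A),
    "node-b": parse_properties(NODE_B),
})
print(mismatches)
```

An empty result means the compared nodes agree on every rssearch.* property found.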
Configuration Scenarios
Each phase starts its clock only once the previous phase has finished, and the timing applies to the index as a whole.
Hot Phase
Active indices that are currently being written to. Rollover can be based on size instead of age by choosing a size value over a days value (e.g. rssearch.lsm.rolloversize=20). The value is specified in GB. This setting is primarily used for extremely active environments or scenarios where automation results require a lot of storage. It also causes the ActionTask, Worksheet, and Processrequest indices to roll over out of sync with one another, because each index has different storage requirements for each automation result. Note that older worksheet results may disappear from the GUI with this setting.
Warm Phase
- This is the transition phase from actively written indices to read-only indices. Indices can still be written to, but rssearch.lsm.readonlydays sets the final point at which the index can be written to.
- For example, if an automation takes 15 days to complete, the suggested value of rssearch.lsm.readonlydays is 15.
Cold Phase
Actions Pro uses this phase to streamline the performance of the platform. Data that is readable from the GUI has memory requirements. These vary from environment to environment and must be sized accordingly. The current default is 30 days and can be adjusted based on usage. This means that only 30 days of worksheets are available through the Worksheets dashboard. Data that needs to be queried beyond this period requires an external tool (curl, Kibana, etc.).
Delete Phase
This is the absolute storage setting for Resolve Actions Pro. Data older than the retention period will be purged.
Scenarios
90 days of worksheet access. Some automations last 5 days. Retention requirement of 2 years.
- rssearch.lsm.rolloversize=7
- rssearch.lsm.readonlydays=5
- rssearch.lsm.freezedays=85*
- rssearch.lsm.deletedays=731 (adding a 1-day buffer is recommended)
* The value is 85 days and not 90 because freezedays counts from the end of the warm phase (5 + 85 = 90). Depending on how busy the platform is, this setting may cause memory constraint issues.
30 days of worksheet access. Some automations last 2 days. Retention requirement of 180 days.
- rssearch.lsm.rolloversize=7
- rssearch.lsm.readonlydays=2
- rssearch.lsm.freezedays=28*
- rssearch.lsm.deletedays=181 (adding a 1-day buffer is recommended)
* The value is 28 days and not 30 because freezedays counts from the end of the warm phase (2 + 28 = 30). Depending on how busy the platform is, this setting may cause memory constraint issues.
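The arithmetic behind both scenarios can be sketched as a small helper. It only mirrors the pattern of the two scenarios above (read-only days equal the longest automation, freeze days are the remainder of the access window, delete days add a 1-day buffer); the function name and its parameters are hypothetical, and rssearch.lsm.rolloversize is left out because it is chosen separately.

```python
def derive_lsm_settings(access_days: int, longest_automation_days: int,
                        retention_days: int) -> dict:
    """Derive the day-based LSM settings from three requirements.

    access_days:              how long worksheets must stay visible in the UI
    longest_automation_days:  duration of the longest-running automation
    retention_days:           how long data must be retained before deletion
    """
    if longest_automation_days >= access_days:
        raise ValueError("access window must exceed the longest automation")
    return {
        "rssearch.lsm.readonlydays": longest_automation_days,
        # freezedays counts from the end of the warm phase
        "rssearch.lsm.freezedays": access_days - longest_automation_days,
        # 1-day buffer, as recommended in the scenarios
        "rssearch.lsm.deletedays": retention_days + 1,
    }


# First scenario: 90 days of access, 5-day automations, 2 years (730 days).
print(derive_lsm_settings(90, 5, 730))
# Second scenario: 30 days of access, 2-day automations, 180 days.
print(derive_lsm_settings(30, 2, 180))
```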
Changes to Worksheet Data Retention
To maintain the efficiency of the Actions Pro Worksheets dashboard, Actions Pro has adopted the Cold Phase storage model, in which data that is deemed old enough (as set by rssearch.lsm.freezedays) is left out of general searches and is no longer searchable by aliases.
Sizing Considerations
Actions Pro environments differ from one to the next (Prod, UAT, Dev, Staging, Test, etc.) and from one customer to the next. Sizing also changes with the retention policy in use. Actions Pro makes the following recommendations.
Determine the usage type of the Installation (Prod/Dev)
Determine the usage and Retention policy
- Small amount of execution, but querying a large amount of data?
- A Large amount of executions, but light on response data?
- A different combination?
- Long retention policy (2 years), Longer retention policy (7 years?)
Sizing
Monitor disk usage over the following intervals:
- 1 day, 7 days, 1 month, 3 months, 6 months, 1 year
- Track and trend the growth of the disk usage across these intervals
- Add a buffer for unforeseen circumstances
- This will help determine a close approximation of the storage requirements
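The trend-and-buffer approach above can be sketched as a simple linear projection. This is only an illustration: it assumes growth is roughly linear between the first and last sample, the 20% buffer is an arbitrary placeholder, and the sample figures are hypothetical rather than measurements.

```python
def project_storage_gb(samples, horizon_days, buffer_ratio=0.2):
    """Project disk usage `horizon_days` past the last sample.

    samples: list of (day_number, used_gb) tuples, ordered by day.
    Uses the average daily growth between the first and last sample,
    then adds `buffer_ratio` on top for unforeseen circumstances.
    """
    (first_day, first_gb), (last_day, last_gb) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("need at least two samples on different days")
    daily_growth = (last_gb - first_gb) / (last_day - first_day)
    projected = last_gb + daily_growth * horizon_days
    return projected * (1 + buffer_ratio)


# Hypothetical readings taken on day 0, day 7, and day 30.
readings = [(0, 100.0), (7, 107.0), (30, 130.0)]
print(f"{project_storage_gb(readings, horizon_days=365):.1f} GB")
```

Re-running the projection as the 3-month, 6-month, and 1-year readings arrive shows whether the early trend still holds.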
Baselining Storage usage from Pre-existing Installations
Database Usage
MySQL
SELECT TABLE_NAME, ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024)
AS 'SIZE (MB)'
FROM information_schema.TABLES
WHERE TABLE_SCHEMA="resolve"
AND (DATA_LENGTH + INDEX_LENGTH)/1024/1024 >= 1
AND TABLE_NAME LIKE '%ARCHIVE_%'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
Example output:
Expected tables: archive_action_result_lob, archive_action_result, archive_process_request, archive_worksheet, archive_execute_request, archive_execute_result, archive_execute_dependency, archive_execute_result_lob, archive_worksheet_debug, archive_worksheet_data
Oracle
SELECT segment_name AS "Object Name", tablespace_name AS "Tablespace", ROUND(bytes/1024/1024, 2) AS "Object Size (MB)"
FROM dba_segments -- dba_segments has an OWNER column; user_segments does not
WHERE owner = '<Actions Pro Schema Name>' AND segment_type = 'TABLE';
Elasticsearch Usage
- Versions 5.x and 6.x:
curl -XGET "https://<ES URL>:9200/_cat/indices?v&s=store.size:desc"
- Version 7.x and later:
curl -XGET -u elastic_login:elastic_password -k "https://<ES URL>:9200/_cat/indices?v&s=store.size:desc"
Example output:
Sizing
Add the size of the database “Archive*” tables found above to the aggregate “pri.store.size” of the Elasticsearch indices to obtain the retention size requirement for the Actions Pro filesystem.
Example:
Using the figures above: 91 MB from the database and 1489 MB from Elasticsearch. The general recommended setting may be fine for the short term, but notice the sudden growth trend caused by a new automation that was added on April 20th (taskresult_20210420). It is recommended to monitor this trend over 1 day, 1 week, 1 month, and 1 quarter to determine the right sizing for the intended environment.
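The aggregation can be scripted rather than added up by hand. The sketch below assumes the index listing was fetched with the `format=json` and `bytes=b` query parameters of `_cat/indices`, so each row carries `pri.store.size` as a byte count. The per-index split in the sample is hypothetical; only the 91 MB and 1489 MB totals come from the example above.

```python
import json


def total_primary_bytes(cat_indices_json: str) -> int:
    """Sum pri.store.size over all rows of a _cat/indices JSON response."""
    rows = json.loads(cat_indices_json)
    # Closed indices may report null sizes, so skip empty values.
    return sum(int(row["pri.store.size"]) for row in rows
               if row.get("pri.store.size"))


def retention_estimate_mb(database_mb: float, es_primary_bytes: int) -> float:
    """Database archive tables plus Elasticsearch primary store size, in MB."""
    return database_mb + es_primary_bytes / 1024 / 1024


# Hypothetical response body; the two sizes add up to 1489 MB.
SAMPLE = json.dumps([
    {"index": "taskresult-000001", "pri.store.size": "1048576000"},
    {"index": "worksheet-000001", "pri.store.size": "512753664"},
])

es_bytes = total_primary_bytes(SAMPLE)
print(retention_estimate_mb(91, es_bytes))  # 1580.0
```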
Elasticsearch Backend Changes
- All reads and writes to Elasticsearch should now be done exclusively through the aliases: executionsummaryalias, processrequestalias, taskresultalias, and worksheetalias. Any code that still uses a specific index name must be changed to use the alias.
- The creation of the LSM rules is done using a new connector, org.elasticsearch.client.RestHighLevelClient. Elasticsearch will be deprecating the older Client in a future release so over time all back end functionality will need to be switched over to this new client.
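When hunting for code that still uses specific index names, a helper that maps a concrete name to its alias may be useful. The sketch below assumes concrete names are the base name followed by "-" or "_" and a numeric suffix, and that the alias is the base name plus "alias", as in the list above; the function name `alias_for` is hypothetical.

```python
import re

# Aliases for the LSM-managed rolling indices, as listed above.
ROLLING_ALIASES = {
    "executionsummaryalias",
    "processrequestalias",
    "taskresultalias",
    "worksheetalias",
}


def alias_for(index_name: str) -> str:
    """Map a concrete rolling index name (old or new format) to its alias."""
    if index_name in ROLLING_ALIASES:
        return index_name  # already an alias
    base = re.sub(r"[-_]\d+$", "", index_name)  # drop the numeric suffix
    alias = base + "alias"
    if alias not in ROLLING_ALIASES:
        raise ValueError(f"no known alias for index {index_name!r}")
    return alias


print(alias_for("worksheet-000001"))     # worksheetalias
print(alias_for("taskresult_20210420"))  # taskresultalias
```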
Updating the Elasticsearch SSL Certificate
To apply new SSL certificates to Elasticsearch within Resolve:
- Obtain New Certificates
  - Download the new Elasticsearch certificates from the Resolve Support Portal.
  - Internally authored certificates may be used in place of the Resolve-issued certificates.
- Prepare for Installation
  - Unzip the downloaded certificate files.
- Locate the Config Directory
  - Find the Elasticsearch config cert folder on each core server where Elasticsearch was installed.
- Apply the New Certificates
  - Replace the existing three certificate files in the Elasticsearch config cert folder with the new ones you downloaded.
  - Repeat this step on every core server, even those not running Elasticsearch.
- Restart the Cluster
  - Stop all Resolve services on all core servers.
  - Start all Resolve services on all core servers.
  - Confirm that all components, including RSControl, have restarted successfully.
- Verify the Installation (Optional)
  - To check that the updated certificates are applied, run the following command:
    curl -XGET -k -u elastic:<ES_PASS> -v
- Monitor for Issues
  - Check that all components, especially those connected to Elasticsearch, function correctly.
Important Notes
- Restart all components connecting to Elasticsearch to load the new certificates.
- If you see any warning messages about certificate trust, ensure all services have fully restarted.
- Some components, like RSControl, may take longer to restart fully.
Non-LSM Change
While making the changes for LSM management of the rolling indices, some changes were made to the static indices as well.
- Index creation now uses templates instead of mapping files. The startup will create/update the various templates so index creation should now simply supply the name, and the template will take care of the rules.
- The static indices have changed their names to include the major ES version, e.g. actiontask-7.
- Access to static indices, read and write, should be done through an alias. E.g. actiontaskalias.
- Any code that still attempts to write to the old static index names must be fixed.
The reason for these changes is to make migrating the data in the static indices easier for future major ES version releases. So for example after updating to ES 8 the new write indices for the static indices will be the -8 version, and before updating to ES 9 any -7 indices will need to be migrated or removed since ES would no longer support indices created on that version.
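The migration rule in the example above can be sketched as a filter over index names. It assumes static indices are named as the base name, a hyphen, and a one- or two-digit major version (e.g. actiontask-7), and it generalizes the ES 7 to ES 9 example to "two or more major versions behind the target"; the function name is hypothetical.

```python
import re

# Static index names carry the major ES version, e.g. "actiontask-7".
# Rolling indices such as "worksheet-000001" do not match this pattern.
STATIC_NAME = re.compile(r"^(?P<base>[a-z]+)-(?P<major>\d{1,2})$")


def indices_to_migrate(index_names, target_major: int):
    """List static indices that must be migrated or removed before
    upgrading to `target_major` (created two or more majors earlier)."""
    stale = []
    for name in index_names:
        match = STATIC_NAME.match(name)
        if match and int(match.group("major")) <= target_major - 2:
            stale.append(name)
    return stale


names = ["actiontask-7", "actiontask-8", "worksheet-000001"]
print(indices_to_migrate(names, target_major=9))  # ['actiontask-7']
```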
Considerations for High-Volume Environments
When sizing high-volume environments, review existing taskresult trends and compare them with the blueprint property rssearch.lsm.rolloversize. If 7 days' worth (the default value) of task results could grow beyond 20 GB, reduce rolloversize accordingly. Contact support if daily task results approach or exceed 20 GB.
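The adjustment can be estimated from the observed daily growth of the taskresult index. The sketch below assumes growth is steady from day to day and caps the result at the 7-day default; the function name and its parameters are hypothetical.

```python
import math


def suggest_rollover_days(daily_taskresult_gb: float,
                          max_index_gb: float = 20.0,
                          default_days: int = 7):
    """Suggest a rssearch.lsm.rolloversize value in days.

    Returns a string such as "5d", or None when a single day already
    reaches the 20 GB ceiling (the case where support should be contacted).
    """
    if daily_taskresult_gb <= 0:
        raise ValueError("daily growth must be positive")
    days = math.floor(max_index_gb / daily_taskresult_gb)
    if days < 1:
        return None
    return f"{min(days, default_days)}d"


print(suggest_rollover_days(4.0))   # 5d
print(suggest_rollover_days(1.0))   # 7d
print(suggest_rollover_days(25.0))  # None
```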
Considerations for Long-Running Automations
Some environments have long-running automations that exceed the normal hot phase time period. Their rolled-over indices reside in the warm phase. The blueprint property rssearch.lsm.readonlydays controls when warm-phase indices are marked read-only in preparation for cold-phase storage. The rssearch.lsm.freezedays property controls when the warm-phase index moves to the cold phase, where it is no longer viewable through the UI. Consider the maximum possible duration of the automation and adjust rssearch.lsm.readonlydays accordingly.
Example:
If an automation runs for 30 days, rssearch.lsm.readonlydays needs to be set to 30.
Considerations for Longer Retention Period
Some users need to see historical automation results from the past month or quarter. While it is preferable to see this information through the Actions Pro UI, doing so is a resource burden (memory and CPU) on the product. Kibana, curl, or another advanced REST query tool is highly recommended for pulling this information.
If it is absolutely necessary to have worksheet information available through the Actions Pro UI, monitoring the CPU and memory usage of Actions Pro is highly recommended.
The rssearch.lsm.freezedays property is the blueprint value that moves the warm-phase index to the cold phase. It sets the number of days after which the index is moved.